
Alina Oprea

Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

Feb 04, 2026

Semantics-Preserving Evasion of LLM Vulnerability Detectors

Jan 30, 2026

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

Jan 27, 2026

Identifying Models Behind Text-to-Image Leaderboards

Jan 14, 2026

PoolFlip: A Multi-Agent Reinforcement Learning Security Environment for Cyber Defense

Aug 27, 2025

Cascading Adversarial Bias from Injection to Distillation in Language Models

May 30, 2025

R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model

May 19, 2025

ACE: A Security Architecture for LLM-Integrated App Systems

Apr 29, 2025

SAGA: A Security Architecture for Governing AI Agentic Systems

Apr 27, 2025

Quantitative Resilience Modeling for Autonomous Cyber Defense

Mar 04, 2025